Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes

Authors

  • Wim De Mulder
  • Martin Kuiper
  • René K. Boel
Abstract

Clustering is an important approach in the analysis of biological data, and is often a first step in identifying interesting patterns of coexpression in gene expression data. Because of the high complexity and diversity of gene expression data, many genes cannot easily be assigned to a cluster; yet even if such genes are highly dissimilar to all gene groups, they are ultimately forced to become members of a cluster. In this paper we show how to detect such elements, called unstable elements. We have developed an approach for iterative clustering algorithms in which unstable elements are deleted, making the iterative algorithm less dependent on the initial centers. Although the approach is unsupervised, the clusters into which the reduced data set is subdivided are less likely to contain false positives. This yields a more differentiated approach to clustering biological data, since the cluster analysis is divided into two parts: the pruned data set is divided into highly consistent clusters in an unsupervised way, and the removed, unstable elements, for which no meaningful cluster exists in unsupervised terms, can be assigned to a cluster using biological knowledge and information about the likelihood of cluster membership. We illustrate our framework on both an artificial and a real biological data set.
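The abstract does not spell out how unstable elements are detected, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes k-means as the iterative algorithm, reruns it from several random initial centers, and scores each element by how consistently it is grouped with (or separated from) every other element across runs. The names `kmeans_labels`, `stability_scores`, `prune_unstable` and the `threshold` parameter are hypothetical and chosen for illustration.

```python
import numpy as np


def kmeans_labels(X, k, rng, max_iter=100):
    """Plain Lloyd's k-means started from k randomly chosen data points."""
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def stability_scores(X, k, n_runs=20, seed=0):
    """Score each element by the consistency of its pairwise groupings.

    coassoc[i, j] is the fraction of runs in which i and j share a cluster.
    A pair is consistent when that fraction is close to 0 or 1, so the
    (assumed) score of element i is the mean of max(c, 1 - c) over all j != i.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans_labels(X, k, rng)
        together += labels[:, None] == labels[None, :]
    coassoc = together / n_runs
    agreement = np.maximum(coassoc, 1.0 - coassoc)
    # Subtract the self-pair (always 1.0) before averaging.
    scores = (agreement.sum(axis=1) - 1.0) / (n - 1)
    return scores, coassoc


def prune_unstable(X, k, threshold=0.95, **kwargs):
    """Return the stable rows of X and the indices of the removed rows."""
    X = np.asarray(X, dtype=float)
    scores, _ = stability_scores(X, k, **kwargs)
    keep = scores >= threshold
    return X[keep], np.flatnonzero(~keep)
```

Calling `prune_unstable(X, k=4)` on an expression matrix `X` (genes as rows) would return the pruned data set together with the indices of the genes set aside for knowledge-based assignment. The threshold of 0.95 is an arbitrary illustrative choice; a score of 1.0 means the element was grouped identically in every run.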

Similar resources

Annotation-based Distance Measures for Patient Subgroup Discovery in Clinical Microarray Studies

MOTIVATION Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been sugge...

Characterization of Gene Functional Expression Profiles of Plasmodium falciparum

ABSTRACT Our objective was to provide functional characterization of gene expression related to the Intraerythrocytic Developmental Cycle (IDC) of Plasmodium falciparum. We explored the hypothesis that genes with the same or similar function are likely to have similar expression profiles. Analysis of 1,051 Gene Ontology (GO) terms represented by at least two genes in Plasmodium falciparum microarray data...

Evolutionary fuzzy cluster analysis with Bayesian validation of gene expression profiles

Cluster analysis of gene expression profiles has been used to identify the functions of unknown genes. Fuzzy clustering, one category of clustering methods, assigns a sample to multiple clusters according to its degrees of membership. It is more appropriate for analyzing gene expression profiles because genes usually belong to multiple functional families. However, general clusteri...

Investigation of the effects of human papillomavirus-induced changes in cellular microRNA expression in head and neck squamous cell carcinoma cells at the gene expression profile level

Background and aim: Human Papilloma Virus plays an important role in some human malignancies and causes alterations in the normal expression levels of cellular microRNAs. In this paper, we evaluated the effects of such changes on Head and Neck Squamous Cell Carcinoma tumor samples at the gene expression profile level. Methods: In this descriptive-analytical study, gene expression profiles of 36 tum...

Evolutionary Fuzzy Clustering Algorithm with Knowledge-Based Evaluation and Applications for Gene Expression Profiling

In microarray data analysis, clustering is a method that groups thousands of genes by the similarity of their expression levels, helping to analyze gene expression profiles. This method has been used to identify unknown functions of genes. The fuzzy clustering method assigns one sample to multiple groups according to its degrees of membership. This method is more appropriate for analyzing g...

Journal:
  • Journal of integrative bioinformatics

Volume 7, Issue 3

Pages: -

Publication date: 2010